function estimate
Counterfactual Survival Q Learning for Longitudinal Randomized Trials via Buckley James Boosting
We propose a Buckley James (BJ) Boost Q learning framework for estimating optimal dynamic treatment regimes under right censored survival data, tailored for longitudinal randomized clinical trial settings. The method integrates accelerated failure time models with iterative boosting techniques, including componentwise least squares and regression trees, within a counterfactual Q learning framework. By directly modeling conditional survival time, BJ Boost Q learning avoids the restrictive proportional hazards assumption and enables unbiased estimation of stage specific Q functions. Grounded in potential outcomes, this framework ensures identifiability of the optimal treatment regime under standard causal assumptions. Compared to Cox based Q learning, which relies on hazard modeling and may suffer from bias under misspecification, our approach provides robust and flexible estimation. Simulation studies and analysis of the ACTG175 HIV trial demonstrate that BJ Boost Q learning yields higher accuracy in treatment decision making, especially in multistage settings where bias can accumulate.
An Online Multiple Kernel Parallelizable Learning Scheme
Ruiz-Moreno, Emilio, Beferull-Lozano, Baltasar
The performance of reproducing kernel Hilbert space-based methods is known to be sensitive to the choice of the reproducing kernel. Choosing an adequate reproducing kernel can be challenging and computationally demanding, especially in data-rich tasks without prior information about the solution domain. In this paper, we propose a learning scheme that scalably combines several single kernel-based online methods to reduce the kernel-selection bias. The proposed learning scheme applies to any task formulated as a regularized empirical risk minimization convex problem. More specifically, our learning scheme is based on a multi-kernel learning formulation that can be applied to widen any single-kernel solution space, thus increasing the possibility of finding higher-performance solutions. In addition, it is parallelizable, allowing for the distribution of the computational load across different computing units. We show experimentally that the proposed learning scheme outperforms the combined single-kernel online methods separately in terms of the cumulative regularized least squares cost metric.
Automatic Illumination Spectrum Recovery
Habili, Nariman, Oorloff, Jeremy, Petersson, Lars
We develop a deep learning network to estimate the illumination spectrum of hyperspectral images under various lighting conditions. To this end, a dataset, IllumNet, was created. Images were captured using a Specim IQ camera under various illumination conditions, both indoor and outdoor. Outdoor images were captured in sunny, overcast, and shady conditions and at different times of the day. For indoor images, halogen and LED light sources were used, as well as mixed light sources, mainly halogen or LED and fluorescent. The ResNet18 network was employed in this study, but with the 2D kernel changed to a 3D kernel to suit the spectral nature of the data. As well as fitting the actual illumination spectrum well, the predicted illumination spectrum should also be smooth, and this is achieved by the cubic smoothing spline error cost function. Experimental results indicate that the trained model can infer an accurate estimate of the illumination spectrum.
Selective Uncertainty Propagation in Offline RL
Krishnamurthy, Sanath Kumar, Gangwani, Tanmay, Katariya, Sumeet, Kveton, Branislav, Rangi, Anshuka
We study the finite-horizon offline reinforcement learning (RL) problem. Since actions at any state can affect next-state distributions, the related distributional shift challenges can make this problem far more statistically complex than offline policy learning for a finite sequence of stochastic contextual bandit environments. We formalize this insight by showing that the statistical hardness of offline RL instances can be measured by estimating the size of actions' impact on next-state distributions. Furthermore, this estimated impact allows us to propagate just enough value function uncertainty from future steps to avoid model exploitation, enabling us to develop algorithms that improve upon traditional pessimistic approaches for offline RL on statistically simple instances. Our approach is supported by theory and simulations.
Bias and Variance
When trying to fit a machine learning model to a data, we would have faced scenarios where the model will work fine on the training data, but will fail miserably on the test data. In machine learning terms we will call that overfitting and it will be specified by using terms like bias and variance. Though most of us has got the understanding of what overfitting means, we still don't get the full picture of what is happening in the background. So, here I will give a clear picture of what is happening behind the scenes of bias and variance. First lets talk about target function, What is a target function?
Rejoinder: Learning Optimal Distributionally Robust Individualized Treatment Rules
Mo, Weibin, Qi, Zhengling, Liu, Yufeng
We thank the opportunity offered by editors for this discussion and the discussants for their insightful comments and thoughtful contributions. We also want to congratulate Kallus (2020) for his inspiring work in improving the efficiency of policy learning by retargeting. Motivated from the discussion in Dukes and Vansteelandt (2020), we first point out interesting connections and distinctions between our work and Kallus (2020) in Section 1. In particular, the assumptions and sources of variation for consideration in these two papers lead to different research problems with different scopes and focuses. In Section 2, following the discussions in Li et al. (2020); Liang and Zhao (2020), we also consider the efficient policy evaluation problem when we have some data from the testing distribution available at the training stage. We show that under the assumption that the sample sizes from training and testing are growing in the same order, efficient value function estimates can deliver competitive performance. We further show some connections of these estimates with existing literature. However, when the growth of testing sample size available for training is in a slower order, efficient value function estimates may not perform well anymore. In contrast, the requirement of the testing sample size for DRITR is not as strong as that of efficient policy evaluation using the combined data. Finally, we highlight the general applicability and usefulness of DRITR in Section 3.
Fully Nonparametric Bayesian Additive Regression Trees
George, Edward, Laud, Prakash, Logan, Brent, McCulloch, Robert, Sparapani, Rodney
Bayesian Additive Regression Trees (BART) is a fully Bayesian approach to modeling with ensembles of trees. BART can uncover complex regression functions with high dimensional regressors in a fairly automatic way and provide Bayesian quantification of the uncertainty through the posterior. However, BART assumes IID normal errors. This strong parametric assumption can lead to misleading inference and uncertainty quantification. In this paper, we use the classic Dirichlet process mixture (DPM) mechanism to nonparametrically model the error distribution. A key strength of BART is that default prior settings work reasonably well in a variety of problems. The challenge in extending BART is to choose the parameters of the DPM so that the strengths of the standard BART approach is not lost when the errors are close to normal, but the DPM has the ability to adapt to non-normal errors.
High-dimensional additive modeling
Meier, Lukas, van de Geer, Sara, Bühlmann, Peter
We propose a new sparsity-smoothness penalty for high-dimensional generalized additive models. The combination of sparsity and smoothness is crucial for mathematical theory as well as performance for finite-sample data. We present a computationally efficient algorithm, with provable numerical convergence properties, for optimizing the penalized likelihood. Furthermore, we provide oracle results which yield asymptotic optimality of our estimator for high dimensional but sparse additive models. Finally, an adaptive version of our sparsity-smoothness penalized approach yields large additional performance gains.